There are folium maps in this notebook. If they do not display well, please have a look at the .html file from the same repository.
Our main goal in this project is to find social and economic world-wide relations of countries based on the state of their agricultural sector, using indicators such as imports, exports, production, self-sufficiency, etc. In order to find such indicators, we would like to use the data from the "Global Food & Agriculture Statistics" datasets. First, we would like to produce a map showing which countries "feed the world" i.e. which countries are net-exporting food. That map would contain a slider to show how this evolved over the past fifty years. Then we would like to show countrywise the level of food self-sufficiency i.e. the way a country does not need to trade with other ones in order to feed its population. We will also compare it to nations' economic development and see if some correlations can be found. We will visualize our best findings with several interactive maps and plots.
We would like to work on the following research questions:
External imports:
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
import folium
import seaborn as sns
import json
import re
import requests
from bs4 import BeautifulSoup
from ipywidgets import interact
from IPython.display import display
Auxiliary function imports:
We have implemented some functions into a dedicated module (file Milestone_2_scripts.py) in order to simplify the code and make this notebook more enjoyable to read.
#from Milestone_2_scripts import *
Setup:
data_folder_path = "./Data/current_FAO/raw_files/"
files = {"Crops production" : "Production_Crops_E_All_Data_(Normalized).csv",
"Crops trade" : "Trade_Crops_Livestock_E_All_Data_(Normalized).csv",
"Consumer price indices" : "ConsumerPriceIndices_E_All_Data_(Normalized).csv",
"Macroeconomy" : "Macro-Statistics_Key_Indicators_E_All_Data_(Normalized).csv",
"Livestock production" : "Production_Livestock_E_All_Data_(Normalized).csv",
"Live animals trade" : "Trade_LiveAnimals_E_All_Data_(Normalized).csv"
}
interesting_datasets = files.keys()
In this part, we will load, explore and clean the dataset in order to remove typing errors, missing information, inaccuracies, and so on.
Extracting crops harvested area, production, seed and yield from the "Crops production" dataset
Extracting stocks production from the "Livestock production" dataset
Extracting import and export quantities from the "Live animals trade" and "Crops trade" datasets
Extracting average CPI of each year from the "Consumer price indices" dataset
Our main dataset would be a subset of the "Global Food & Agriculture Statistics" that is found in the proposed datasets list. In this dataset, we have seen that we could work with the production as well as import and export quantities per year and per country. We will add information about countries GDP to this database.
def load_datasets(datasets) :
df = {}
for dataset in datasets :
file_path = data_folder_path + files[dataset]
df[dataset] = pd.read_csv(file_path, encoding = "ISO-8859-1")
return df
We load each interresting dataset in the dictionary df :
df = load_datasets(interesting_datasets)
In this part, we will have a first look of the datasets in order to get a first sense of the data.
def display_df(df, datasets):
for dataset in datasets :
display(dataset, df[dataset].sample(5))
In order to see what does the datasets look like, we display a sample of 5 rows for each of them :
display_df(df, interesting_datasets)
At first glance, our datasets seem very clean.
Each of our dataset contains a column "Year" and a column that is either named "Area" or "Country". This is a great news for us since we want to do a both geographical and time-related analysis.
The columns "Area" and "Country" both correspond to the country except that the "Area" may contains a group of country (e.g. "Eastern Europe").
In this part, we will clean the datasets. The final goal is to produce one uniformized and normalized dataset on which we could work (see 1.F).
Such a cleaned dataset may look like this (in a very simplistic way):
Country | Year | GDP | Crops production | Livestock production
In this section, we will create dataframes in df_useful which correspond to previous dataframes without the unuseful data.
df_useful = {}
def extract_GDP(df):
def selection_GDP(df):
return df['Item']=='Gross Domestic Product'
def selection_US_dollars(df):
return df['Element']=="Value US$"
def drop_columns(df):
dropped_colmuns = ["Item Code", "Item", "Element Code", "Element", "Flag", "Year Code", "Unit"]
return df.drop(columns = dropped_colmuns)
return drop_columns(df[selection_GDP(df)&selection_US_dollars(df)])
df_useful["GDP"] = extract_GDP(df["Macroeconomy"])
display(df_useful["GDP"].sample(5))
select_switzerland = df_useful["GDP"]['Area']=='Switzerland'
select_france = df_useful["GDP"]['Area']=='France'
select_austria = df_useful["GDP"]['Area']=='Austria'
select_canada = df_useful["GDP"]['Area']=='Canada'
ax = df_useful["GDP"][select_switzerland].plot(x ='Year', y='Value', kind = 'line')
ax = df_useful["GDP"][select_france].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = df_useful["GDP"][select_austria].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = df_useful["GDP"][select_canada].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["Switzerland", 'France', 'Austria', "Canada"])
select_USSR = df_useful["GDP"]['Area']=='USSR'
select_russia = df_useful["GDP"]['Area']=='Russian Federation'
select_ukraine = df_useful["GDP"]['Area']=='Ukraine'
ax = df_useful["GDP"][select_USSR].plot(x ='Year', y='Value', kind = 'line')
ax = df_useful["GDP"][select_russia].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = df_useful["GDP"][select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["USSR", 'Russia', 'Ukraine'])
def get_food_crops():
#Return a list of crops categorized as food crops https://world-crops.com/food-crops/
url="https://world-crops.com/food-crops/"
r=requests.get(url,headers={"User-Agent": "XY"})
soup=BeautifulSoup(r.text,'html.parser')
elements_temp=soup.find_all('a',href=re.compile("^../"))
elements=[el.text for el in elements_temp]
#only 40 elements are displayed on each page->iterating on the total list
for i in range(40,401,40):
url_i=url+"?ss="+str(i)
r=requests.get(url_i,headers={"User-Agent":"XY"})
soup=BeautifulSoup(r.text,'html.parser')
new_elements=soup.find_all('a',href=re.compile("^../"))
elements+=[el.text for el in new_elements]
return elements
def inclusive_search(string,elements):
#returns true if the string can be found in elements. The search removes special characters from string in order to include more positive results
string=string.lower()
delimiters = ",", "(","&",")"," and "," "
pattern = '|'.join(map(re.escape, delimiters))
strings=list(filter(None,re.split(pattern,string)))
found=False
for s in strings:
if s=="nes":
continue
for el in elements:
found=(s in el.split())
if found==False and s[-1]=="s":
found=s[:-1] in el.split()
if found==False and s[-2:]=="es":
found=s[:-2] in el.split()
if found==False and s[-3:]=="ies":
found=s[:-3]+"y" in el.split()
if found==True:
return found
return found
def get_food_crop_data(df):
#extracts the food crop data, returns 4 df: Area,Production,Seed and yield
df=df.copy()
food_crops=list(map(lambda x: x.lower(),get_food_crops()))
crop_types_df=df[['Item','Value']].groupby('Item').sum()
crop_types_df=crop_types_df[list(map(lambda x : inclusive_search(x,food_crops) , crop_types_df.index ))]
food_crop_df=df[df.Item.apply(lambda x: x in crop_types_df.index)]
return (food_crop_df[food_crop_df.Element=='Area harvested'],
food_crop_df[food_crop_df.Element=='Production'],
food_crop_df[food_crop_df.Element=='Seed'],
food_crop_df[food_crop_df.Element=='Yield'])
food_crop_area_df , food_crop_production_df , food_crop_seed_df , food_crop_yield_df = get_food_crop_data(df["Crops production"])
df_useful['Crops Area harvested'] = food_crop_area_df.drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
df_useful['Crops Production'] = food_crop_production_df.drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
df_useful['Crops Seed'] = food_crop_seed_df.drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
df_useful['Crops Yield'] = food_crop_yield_df.drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
display(df_useful['Crops Area harvested'].sample(5))
display(df_useful['Crops Production'].sample(5))
display(df_useful['Crops Seed'].sample(5))
display(df_useful['Crops Yield'].sample(5))
select_Maize = df_useful['Crops Area harvested']['Item']=='Maize'
maize_df = df_useful['Crops Area harvested'][select_Maize]
select_switzerland = maize_df['Area']=='Switzerland'
select_france = maize_df['Area']=='France'
select_austria = maize_df['Area']=='Austria'
select_canada = maize_df['Area']=='Canada'
ax = maize_df[select_switzerland].plot(x ='Year', y='Value', kind = 'line')
ax = maize_df[select_france].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_austria].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_canada].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["Switzerland", 'France', 'Austria', "Canada"])
select_USSR = maize_df['Area']=='USSR'
select_russia = maize_df['Area']=='Russian Federation'
select_ukraine = maize_df['Area']=='Ukraine'
ax = maize_df[select_USSR].plot(x ='Year', y='Value', kind = 'line')
ax = maize_df[select_russia].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["USSR", 'Russia', 'Ukraine'])
selection_stocks = df['Livestock production']["Element"] == 'Stocks'
df_useful['Livestock production'] = df['Livestock production'][selection_stocks].drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
display(df_useful['Livestock production'].sample(5))
select_pigs = df_useful['Livestock production']['Item']=='Pigs'
pigs_df = df_useful['Livestock production'][select_pigs]
select_switzerland = pigs_df['Area']=='Switzerland'
select_france = pigs_df['Area']=='France'
select_austria = pigs_df['Area']=='Austria'
select_canada = pigs_df['Area']=='Canada'
ax = pigs_df[select_switzerland].plot(x ='Year', y='Value', kind = 'line')
ax = pigs_df[select_france].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_austria].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_canada].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["Switzerland", 'France', 'Austria', "Canada"])
select_USSR = pigs_df['Area']=='USSR'
select_russia = pigs_df['Area']=='Russian Federation'
select_ukraine = pigs_df['Area']=='Ukraine'
ax = pigs_df[select_USSR].plot(x ='Year', y='Value', kind = 'line')
ax = pigs_df[select_russia].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["USSR", 'Russia', 'Ukraine'])
selection_import_quantities = df['Live animals trade']["Element"] == 'Import Quantity'
selection_export_quantities = df['Live animals trade']["Element"] == 'Export Quantity'
df_useful['Live animals import quantities'] = df['Live animals trade'][selection_import_quantities].drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
df_useful['Live animals export quantities'] = df['Live animals trade'][selection_export_quantities].drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
display(df_useful['Live animals import quantities'].sample(5))
select_pigs = df_useful['Live animals import quantities']['Item']=='Pigs'
pigs_df = df_useful['Live animals import quantities'][select_pigs]
select_switzerland = pigs_df['Area']=='Switzerland'
select_france = pigs_df['Area']=='France'
select_austria = pigs_df['Area']=='Austria'
select_canada = pigs_df['Area']=='Canada'
ax = pigs_df[select_switzerland].plot(x ='Year', y='Value', kind = 'line')
ax = pigs_df[select_france].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_austria].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_canada].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["Switzerland", 'France', 'Austria', "Canada"])
select_USSR = pigs_df['Area']=='USSR'
select_russia = pigs_df['Area']=='Russian Federation'
select_ukraine = pigs_df['Area']=='Ukraine'
ax = pigs_df[select_USSR].plot(x ='Year', y='Value', kind = 'line')
ax = pigs_df[select_russia].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["USSR", 'Russia', 'Ukraine'])
display(df_useful['Live animals export quantities'].sample(5))
select_pigs = df_useful['Live animals export quantities']['Item']=='Pigs'
pigs_df = df_useful['Live animals export quantities'][select_pigs]
select_switzerland = pigs_df['Area']=='Switzerland'
select_france = pigs_df['Area']=='France'
select_austria = pigs_df['Area']=='Austria'
select_canada = pigs_df['Area']=='Canada'
ax = pigs_df[select_switzerland].plot(x ='Year', y='Value', kind = 'line')
ax = pigs_df[select_france].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_austria].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_canada].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["Switzerland", 'France', 'Austria', "Canada"])
select_USSR = pigs_df['Area']=='USSR'
select_russia = pigs_df['Area']=='Russian Federation'
select_ukraine = pigs_df['Area']=='Ukraine'
ax = pigs_df[select_USSR].plot(x ='Year', y='Value', kind = 'line')
ax = pigs_df[select_russia].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = pigs_df[select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["USSR", 'Russia', 'Ukraine'])
selection_import_quantities = df['Crops trade']["Element"] == 'Import Quantity'
selection_export_quantities = df['Crops trade']["Element"] == 'Export Quantity'
df_useful['Crops import quantities'] = df['Crops trade'][selection_import_quantities].drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
df_useful['Crops export quantities'] = df['Crops trade'][selection_export_quantities].drop(columns=['Item Code', "Element Code", "Element", "Year Code", "Flag"])
display(df_useful['Crops import quantities'].sample(5))
select_Maize = df_useful['Crops import quantities']['Item']=='Maize'
maize_df = df_useful['Crops import quantities'][select_Maize]
select_switzerland = maize_df['Area']=='Switzerland'
select_france = maize_df['Area']=='France'
select_austria = maize_df['Area']=='Austria'
select_canada = maize_df['Area']=='Canada'
ax = maize_df[select_switzerland].plot(x ='Year', y='Value', kind = 'line')
ax = maize_df[select_france].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_austria].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_canada].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["Switzerland", 'France', 'Austria', "Canada"])
select_USSR = maize_df['Area']=='USSR'
select_russia = maize_df['Area']=='Russian Federation'
select_ukraine = maize_df['Area']=='Ukraine'
ax = maize_df[select_USSR].plot(x ='Year', y='Value', kind = 'line')
ax = maize_df[select_russia].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["USSR", 'Russia', 'Ukraine'])
display(df_useful['Crops export quantities'].sample(5))
select_Maize = df_useful['Crops export quantities']['Item']=='Maize'
maize_df = df_useful['Crops export quantities'][select_Maize]
select_switzerland = maize_df['Area']=='Switzerland'
select_france = maize_df['Area']=='France'
select_austria = maize_df['Area']=='Austria'
select_canada = maize_df['Area']=='Canada'
ax = maize_df[select_switzerland].plot(x ='Year', y='Value', kind = 'line')
ax = maize_df[select_france].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_austria].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_canada].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["Switzerland", 'France', 'Austria', "Canada"])
select_USSR = maize_df['Area']=='USSR'
select_russia = maize_df['Area']=='Russian Federation'
select_ukraine = maize_df['Area']=='Ukraine'
ax = maize_df[select_USSR].plot(x ='Year', y='Value', kind = 'line')
ax = maize_df[select_russia].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = maize_df[select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["USSR", 'Russia', 'Ukraine'])
df_useful['Consumer price indices'] = df['Consumer price indices'][['Area',"Year",'Value']] \
.dropna() \
.groupby(['Area',"Year"]) \
.mean() \
.reset_index() \
.dropna()
display(df_useful['Consumer price indices'].sample(5))
select_switzerland = df_useful['Consumer price indices']['Area']=='Switzerland'
select_france = df_useful['Consumer price indices']['Area']=='France'
select_austria = df_useful['Consumer price indices']['Area']=='Austria'
select_canada = df_useful['Consumer price indices']['Area']=='Canada'
ax = df_useful['Consumer price indices'][select_switzerland].plot(x ='Year', y='Value', kind = 'line')
ax = df_useful['Consumer price indices'][select_france].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = df_useful['Consumer price indices'][select_austria].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = df_useful['Consumer price indices'][select_canada].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["Switzerland", 'France', 'Austria', "Canada"])
select_russia = df_useful["Consumer price indices"]['Area']=='Russian Federation'
select_ukraine = df_useful["Consumer price indices"]['Area']=='Ukraine'
ax = df_useful["Consumer price indices"][select_russia].plot(x ='Year', y='Value', kind = 'line')
ax = df_useful["Consumer price indices"][select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(['Russia', 'Ukraine'])
Having a more detailled look at the dataset, we have remarked that the areas which are real countries are exactely the ones with an "Area Code" below 5000.
#remove Area code >= 5000
for df_name in df_useful :
if 'Area Code' in df_useful[df_name].keys() :
print ("Removing areas which are not countries in", df_name)
selection_countries = df_useful[df_name]['Area Code']<5000
df_useful[df_name] = df_useful[df_name][selection_countries]
display(df_useful[df_name].sample(5))
else :
print (df_name, "is already clean")
In this section, we will explain how we will handle the missing data in previous dataframes for maps.
select_USSR = df_useful["GDP"]['Area']=='USSR'
select_russia = df_useful["GDP"]['Area']=='Russian Federation'
select_ukraine = df_useful["GDP"]['Area']=='Ukraine'
ax = df_useful["GDP"][select_USSR].plot(x ='Year', y='Value', kind = 'line')
ax = df_useful["GDP"][select_russia].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax = df_useful["GDP"][select_ukraine].plot(x ='Year', y='Value', kind = 'line', ax = ax)
ax.legend(["USSR", 'Russia', 'Ukraine'])
In order to vizualize folium maps, we need to associate each country a value. The geojson file that we use is not timestamped and only countries that exist nowadays are inside it. As some countries has been dissolved during the past 50 years, our folium maps won't be complete. For instance, we do not have any value for Ukraine from 1970 to 1989. Our idea to fix this issue is presented in the next paragraph.
Our idea is to map the former country value to each of the current ones. For instance in 1982, USSR GDP is around one trillion $. Therefore, if we associate (only for folium map purposes) this value to each current country that succeeded USSR, all these countries will appear the same color in the folium map, i.e. all the USSR area will appear the same color (and the good one).
In order to do so, one need to identify which countries appeared and disappeared from the dataset and at which year. Then we will use this result along with some historical research in our visualise_world_data_folium function (1Ea).
countries_formation_years = {}
for country in df_useful["GDP"]["Area"].unique():
selection = df_useful["GDP"]["Area"] == country
year_in, year_out = df_useful["GDP"][selection].dropna()["Year"].min(), df_useful["GDP"][selection].dropna()["Year"].max()
for year in (year_in, year_out):
if year not in countries_formation_years :
countries_formation_years[year] = []
countries_formation_years[year_in].append((country,'+'))
countries_formation_years[year_out].append((country,'-'))
countries_formation_years.pop(1970)
countries_formation_years.pop(2015)
for year in sorted(list(countries_formation_years)):
print (year, countries_formation_years[year])
In this part, we will finish prepocessing the datasets. More precisely, we will deal with country names and normalizing the features.
dic = {'Czechia': "Czech Republic",
'Russian Federation':'Russia',
"Serbia":"Republic of Serbia",
'The former Yugoslav Republic of Macedonia':'Macedonia',
'China, mainland':'China',
'Viet Nam':'Vietnam',
'Venezuela (Bolivarian Republic of)':'Venezuela',
'Iran (Islamic Republic of)':'Iran',
'Syrian Arab Republic':"Syria",
'Bolivia (Plurinational State of)': 'Bolivia',
"Côte d'Ivoire": "Ivory Coast",
'Congo':"Republic of the Congo",
"Lao People's Democratic Republic":'Laos',
"Democratic People's Republic of Korea":"North Korea",
'Republic of Korea':"South Korea"}
def correct_country_names(old_name):
if old_name in dic.keys() :
return dic[old_name]
return old_name
for df_name in df_useful :
print (df_name)
df_useful[df_name]["Area"] = df_useful[df_name]["Area"].apply(correct_country_names)
def visualise_world_data_folium(df, year, logScale=True):
dic = {'USSR': ['Armenia', 'Azerbaijan','Belarus', 'Estonia', 'Georgia',
'Kazakhstan', 'Kyrgyzstan', 'Latvia', 'Lithuania',
'Montenegro', 'Republic of Moldova', 'Russia',
'Republic of Serbia', 'Timor-Leste', 'Turkmenistan', 'Ukraine',
'Uzbekistan'],
'Ethiopia PDR': ['Eritrea','Ethiopia'],
'Yugoslav SFR': ['Kosovo', 'Slovenia', 'Croatia',
'Macedonia', 'Bosnia and Herzegovina'],
'Yemen Dem' : ['Yemen'],
'Czechoslovakia': ["Czech Republic", 'Slovakia'],
'Netherlands Antilles (former)': ['Curaçao', 'Sint Maarten (Dutch Part)'],
'Sudan (former)': ['South Sudan', 'Sudan']
}
def add_new_names(old_name):
if old_name in dic.keys() :
return dic[old_name]
return old_name
to_plot=df[df["Year"]==year]
to_plot=(to_plot[['Area','Value']]
.dropna()
.groupby('Area')
.mean()
.reset_index()
.dropna())
to_plot['Area']=to_plot['Area'].apply(add_new_names)
to_plot = to_plot.explode('Area')
if logScale :
to_plot.Value=np.log10(to_plot.Value)
m = folium.Map(location=[40,-10],zoom_start=1.6)
folium.Choropleth(
geo_data=f"https://raw.githubusercontent.com/python-visualization/folium/master/examples/data/world-countries.json",
data=to_plot,
columns=['Area', 'Value'],
key_on='feature.properties.name',
fill_color='YlGn',fill_opacity=0.7,line_opacity=0.2,nan_fill_opacity=0.0
).add_to(m)
folium.LayerControl().add_to(m)
return(m)
display(visualise_world_data_folium(df_useful["GDP"], 1985, True))
TODO, explain why (heavy tail, right skewed, power laws) + do it
For instance the distribution of GDP look a bit like a power law.
sns.distplot(df_useful["GDP"]["Value"], rug=False, hist=False)
#looks better with log scale
sns.distplot(np.log(df_useful["GDP"]["Value"]), rug=False, hist=False)
In this part, we will make one uniformized dataframe uni_df with the following columns.
Country | Year | GDP | Crops production columns | Livestock production columns | Crops importation columns | Livestock importation columns | Crops exportation columns | Livestock exportation | CPI
In this uniformized dataframe, a tuple (Country, Year) uniquely identifies a row.
need_pivot = ['Crops Area harvested',
'Crops Production',
'Crops Seed',
'Crops Yield',
'Livestock production',
'Live animals import quantities',
'Live animals export quantities',
'Crops import quantities',
'Crops export quantities']
def rename_columns(x, word):
if x not in ['Area', 'Year', 'ha', 'tonnes', 'hg/ha', 'Head', '1000 Head']:
return x + ' ' + word
return x
df_useful['GDP'] = df_useful['GDP'].rename(columns = {'Value':'(GDP, million $)'})[["Area",'Year','(GDP, million $)']]
df_useful['Consumer price indices'] = df_useful['Consumer price indices'].rename(columns = {'Value':'(Consumer price indices, %)'})[["Area",'Year','(Consumer price indices, %)']]
for df_name in need_pivot :
df_useful[df_name] = pd.pivot_table(df_useful[df_name], index=["Area",'Year'], columns=["Item","Unit"], values="Value").rename(columns=lambda x: rename_columns(x, df_name))
display(df_useful[df_name].sample(5))
# Deal with the NaN that appeared
for df_name in df_useful :
for column in list(df_useful[df_name]):
if column not in ['Area', 'Year']:
df_useful[df_name][column].fillna(df_useful[df_name][column].median(), inplace=True)
uni_df = df_useful['GDP']
for df_name in need_pivot :
uni_df = pd.merge(uni_df, df_useful[df_name], how='outer', on=['Area', 'Year'])
uni_df = pd.merge(uni_df,df_useful['Consumer price indices'], how='outer', on=['Area', 'Year'])
# Deal with the NaN that appeared
for column in list(uni_df):
if column not in ['Area', 'Year']:
uni_df[column].fillna(uni_df[column].median(), inplace=True)
uni_df.sample(30)
In this part, we will explore the dataset with more details. We will first see in more detail the food productions, importations and exportations, next examine the Consumer price indices, then study the structure of international trade and historical context and finally look at the economic classification of countries.
TODO
In this section we will present and compute the notion of food self-sufficiency.
One may wonder how to know whether a country produce all the food it needs or not. The notion of food-self-sufficency allows to answer to this question. More formally, it is a rate that decribes how much does a country can satisfy to meet its internal consumption needs by production. It describes the extent to which a country is able to feed its population through its domestic food production. We are interested into this measure since we think it could be correlated with the economic conditions of this country.
In order to compute the food self-sufficiency, we will apply the following formula that gives us the food self-sudfficiency as a percentage :
$$\frac{Production \times 100}{Production + Imports – Exports}$$all_columns = list(uni_df)
production_columns = []
import_columns = []
export_columns = []
for column in all_columns:
if (type(column)==tuple) and column[1]=='tonnes':
if 'export quantities' in column[0]:
export_columns.append(column)
elif 'import quantities' in column[0]:
import_columns.append(column)
elif 'Production' in column[0]:
production_columns.append(column)
uni_df[('All productions','tonnes')] = 0
for column in production_columns :
uni_df[('All productions','tonnes')] += uni_df[column]
uni_df[('All imports','tonnes')] = 0
for column in import_columns :
uni_df[('All imports','tonnes')] += uni_df[column]
uni_df[('All exports','tonnes')] = 0
for column in export_columns :
uni_df[('All exports','tonnes')] += uni_df[column]
uni_df[('food self-sufficiency','%')] = 100 * uni_df[('All productions','tonnes')] / (uni_df[('All productions','tonnes')]+uni_df[('All imports','tonnes')]+uni_df[('All exports','tonnes')])
display(uni_df[['Area','Year',('food self-sufficiency','%')]].sample(5))
plot = uni_df[['Area','Year']]
plot["Value"] = uni_df[('food self-sufficiency','%')]
for year in range(1980, 2010, 5):
display(year, visualise_world_data_folium(plot, year, False))
Our dataset contains data for the historical period from 1970 to 2015. In order to be able to correctly interpret the results we are going to see, we first made an historical research on this period. We shortly listed below important events of this period for which we think they have had a significant influence on the agriculture and the economy.
There was the Cold war from 1945 to 1990 with two economic superpowers (USA and USSR). The USSR had been dissoluted in 1991. The Japanese economic miracle occured from 1945 to 1990 and allowed Japan to come out of the disastrous state in which it was at the exit of the WW2 and become world's second largest economy. There has been 2 big oil crisis in 1973 and 1979. There has been many wars (Middle East wars 1973-2000 e.g. Yom Kippur War 1973, Islamic Revolution in Iran 1979, Iran–Iraq war 1980-1988, Gulf war 1990-1991, Yugoslav wars 1991-2001...). We have already seen some consequences of such events by dealing with countries names in a previous section.
The third Agricultural Revolution (also known as Green revolution) occurs form 1960 to 1990 and imporved agricultural productions thanks to fertilizers and chemicals.
The following public-domain image from Wikimedia represents developed countries (blue), developing ones (orange) and least developed ones (red) according to the United Nations and International Monetary Fund. We expect to see similar results with our dataset (GDP).
![]()
The following image, also from Wikimedia shows the cumulative commercial balance for the period 1980-2008. We also expect to see similar results with our dataset, but there might be difference as we focus on agriculture.
![]()
plot = uni_df[['Area','Year']]
plot["Value"] = uni_df["(GDP, million $)"]
for year in range(1980, 2015, 5):
display(year, visualise_world_data_folium(plot, year, True))
Our results seem pretty intersting to share to the world. Moreover we have nice interactive maps and we would like to focus more on visual and style than writting on methodology. Therefore, we would like to produce a data story.